Implementation of directed acyclic word graph
نویسنده
چکیده
An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is shown. A DAWG for a text T is a minimal automaton that accepts all substrings of a text T, so it represents a complete index of the text. While all usual implementations of DAWG needed about 30 times larger storage space than was the size of the text, here we show an implementation that decreases this requirement down to four times the size of the text. The method uses a compression of DAWG elements, i. e. vertices, edges and labels. The construction time of this implementation is linear with respect to the size of the text, a search for a specific pattern is done in a linear time with respect to the size of the pattern. This implementation preserves both good properties of the DAWG automaton.
منابع مشابه
Direct Construction of Compact Directed Acyclic Word Graphs
The Directed Acyclic Word Graph (DAWG) is an e cient data structure to treat and analyze repetitions in a text, especially in DNA genomic sequences. Here, we consider the Compact Directed Acyclic Word Graph of a word. We give the rst direct algorithm to construct it. It runs in time linear in the length of the string on a xed alphabet. Our implementation requires half the memory space used by D...
متن کاملOn Compact Directed Acyclic Word Graphs
The Directed Acyclic Word Graph (DAWG) is a space-e cient data structure to treat and analyze repetitions in a text, especially in DNA genomic sequences. Here, we consider the Compact Directed Acyclic Word Graph of a word. We give the rst direct algorithm to construct it. It runs in time linear in the length of the string on a xed alphabet. Our implementation requires half the memory space used...
متن کاملTernary Directed Acyclic Word Graphs
Given a set S of strings, a DFA accepting S offers a very time-efficient solution to the pattern matching problem over S. The key is how to implement such a DFA in the trade-off between time and space, and especially the choice of how to implement the transitions of each state is critical. Bentley and Sedgewick proposed an effective tree structure called ternary trees. The idea of ternary trees...
متن کاملOn-Line Construction of Compact Directed Acyclic Word Graphs
A Compact Directed Acyclic Word Graph (CDAWG) is a space–efficient text indexing structure, that can be used in several different string algorithms, especially in the analysis of biological sequences. In this paper, we present a new on–line algorithm for its construction, as well as the construction of a CDAWG for a set of strings.
متن کاملSparse Directed Acyclic Word Graphs
The suffix tree of string w is a text indexing structure that represents all suffixes ofw. A sparse suffix tree ofw represents only a subset of suffixes of w. An application to sparse suffix trees is composite pattern discovery from biological sequences. In this paper, we introduce a new data structure named sparse directed acyclic word graphs (SDAWGs), which are a sparse text indexing version ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Kybernetika
دوره 38 شماره
صفحات -
تاریخ انتشار 2002